feat(spanner): add shared endpoint cooldowns for location-aware rerouting #12845

Merged
rahul2393 merged 17 commits into googleapis:main from
rahul2393:endpoint-cooldown-re
Apr 20, 2026

Conversation

@rahul2393
Contributor

@rahul2393 rahul2393 commented Apr 17, 2026

Summary

This PR improves Java Spanner's location-aware bypass routing when routed replicas are overloaded or unavailable, and extends score-based replica selection.

The client now:

  • avoids recently overloaded routed endpoints using shared cooldowns
  • records RESOURCE_EXHAUSTED / UNAVAILABLE as EWMA error penalties
  • uses EWMA-based selection for both preferLeader=false and strong preferLeader=true read/query routing when
    operation_uid is available

It also keeps the location-aware read path lock-free via immutable group snapshots.

What changed

  • Added shared channel-level cooldown tracking for routed endpoints that return RESOURCE_EXHAUSTED / UNAVAILABLE, while still keeping request-scoped exclusions for same-logical-request retries.
  • Updated bypass retry behavior so eligible reads/queries can reroute to another replica instead of immediately
    returning to the same failed endpoint.
  • Recorded RESOURCE_EXHAUSTED / UNAVAILABLE as EWMA error penalties for routed replicas, so unhealthy endpoints are deprioritized even after the immediate retry/cooldown window.
  • Extended score-based routing to strong preferLeader=true read/query traffic when operation_uid is present, using leader preference as a bias instead of a hard override.
  • Kept preferLeader=true behavior unchanged for paths without operation_uid such as mutation/commit routing.
  • Refactored KeyRangeCache group state to immutable snapshots and removed per-group synchronization from the routing hot path.
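The immutable-snapshot idea in the last bullet can be illustrated with a minimal sketch. This is not the PR's `KeyRangeCache` code: the class names `GroupSnapshotSketch` and `GroupStateSketch`, the `generation` field, and the "newer generation wins" rule are all assumptions made for illustration. Readers take one atomic read of an immutable object, and writers publish a replacement with compare-and-set instead of locking.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical immutable view of a group's tablets; not the PR's actual GroupSnapshot. */
final class GroupSnapshotSketch {
  final long generation; // assumed version counter, used only by this sketch
  final List<String> tabletAddresses;

  GroupSnapshotSketch(long generation, List<String> tabletAddresses) {
    this.generation = generation;
    // Defensive copy wrapped as unmodifiable, so readers can never observe a mutation.
    this.tabletAddresses = Collections.unmodifiableList(new ArrayList<>(tabletAddresses));
  }
}

/** Hypothetical holder: writers publish a new snapshot, readers do a single atomic read. */
final class GroupStateSketch {
  private final AtomicReference<GroupSnapshotSketch> current =
      new AtomicReference<>(new GroupSnapshotSketch(0L, Collections.<String>emptyList()));

  /** Hot path: no lock, just one read of an immutable object. */
  GroupSnapshotSketch snapshot() {
    return current.get();
  }

  /** Cold path: swap in a replacement only if it is newer than the published snapshot. */
  boolean update(long generation, List<String> tabletAddresses) {
    GroupSnapshotSketch next = new GroupSnapshotSketch(generation, tabletAddresses);
    while (true) {
      GroupSnapshotSketch prev = current.get();
      if (generation <= prev.generation) {
        return false; // stale update, keep what is already published
      }
      if (current.compareAndSet(prev, next)) {
        return true;
      }
      // Another writer won the race; re-read and re-check.
    }
  }
}
```

A routing call would read `snapshot()` once and work only with that object, so a concurrent `update` cannot change the tablet list mid-selection.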

@rahul2393 rahul2393 requested review from a team as code owners April 17, 2026 22:34
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces an endpoint cooldown mechanism to handle RESOURCE_EXHAUSTED errors and refactors the KeyRangeCache to use immutable snapshots, replacing per-group locking to improve read performance. The new EndpointOverloadCooldownTracker manages short-lived cooldowns with exponential backoff and jitter, while KeyAwareChannel is updated to exclude endpoints on both RESOURCE_EXHAUSTED and UNAVAILABLE status codes. Feedback is provided to optimize the GroupSnapshot constructor by removing a redundant list copy.
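A cooldown with exponential backoff and jitter, as described in the summary above, could look roughly like the following. This is a sketch under stated assumptions, not the PR's `EndpointOverloadCooldownTracker`: the class name `CooldownTrackerSketch`, its method names, the 50 ms base, the 5 s cap, and the "half to full window" jitter range are all hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical sketch; names and constants are assumptions, not the PR's implementation. */
final class CooldownTrackerSketch {
  private static final long BASE_NANOS = 50_000_000L; // assumed 50 ms initial cooldown
  private static final long MAX_NANOS = 5_000_000_000L; // assumed 5 s cap

  private static final class Entry {
    final int failures;
    final long untilNanos;

    Entry(int failures, long untilNanos) {
      this.failures = failures;
      this.untilNanos = untilNanos;
    }
  }

  private final ConcurrentHashMap<String, Entry> entries = new ConcurrentHashMap<>();

  /** Call when an endpoint returns RESOURCE_EXHAUSTED or UNAVAILABLE. */
  void recordFailure(String address, long nowNanos) {
    entries.compute(
        address,
        (key, prev) -> {
          int failures = prev == null ? 1 : prev.failures + 1;
          // Double the window per consecutive failure, capped at MAX_NANOS.
          long window = Math.min(MAX_NANOS, BASE_NANOS << Math.min(failures - 1, 20));
          // Jitter: pick a duration in [window/2, window] so clients do not retry in lockstep.
          long jittered = window / 2 + ThreadLocalRandom.current().nextLong(window / 2 + 1);
          return new Entry(failures, nowNanos + jittered);
        });
  }

  /** Call on a successful response to reset the backoff for that endpoint. */
  void recordSuccess(String address) {
    entries.remove(address);
  }

  /** True while the endpoint should be skipped by replica selection. */
  boolean isCoolingDown(String address, long nowNanos) {
    Entry entry = entries.get(address);
    // Subtraction-based comparison stays correct if System.nanoTime() values wrap.
    return entry != null && nowNanos - entry.untilNanos < 0;
  }
}
```

Because the map is keyed by endpoint address rather than by request, a failure seen by one request steers later, unrelated requests away from the same endpoint until the window expires.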

@rahul2393
Contributor Author

/gemini review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request implements latency-aware routing for Spanner endpoints by introducing a score-based replica selection mechanism using time-decayed EWMA. Key additions include registries for tracking endpoint latency and inflight requests, a cooldown tracker for overloaded endpoints, and updates to the KeyRangeCache to support score-aware selection. Feedback identifies several high-priority issues in the new static registries, including a memory leak in the latency tracker map due to accumulating operation identifiers, potential key collisions between different client instances sharing a JVM, and a race condition when updating inflight request counts. There is also a recommendation to reduce the maximum size of the request ID cache to prevent excessive memory consumption.
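For readers unfamiliar with time-decayed EWMA, here is a minimal sketch of the general technique. It is not the PR's tracker: the class name `TimeDecayedEwmaSketch`, the formula `alpha = 1 - exp(-elapsed / tau)`, the time constant, and the fixed error penalty value are assumptions chosen for illustration. With this formula, two updates at nearly the same instant leave the score almost unchanged, while an update after a long gap lets the new sample dominate.

```java
/** Hypothetical time-decayed EWMA; the formula and constants are assumptions. */
final class TimeDecayedEwmaSketch {
  // Assumed penalty sample recorded for RESOURCE_EXHAUSTED / UNAVAILABLE responses.
  static final double ERROR_PENALTY_MICROS = 1_000_000.0;

  private final double tauNanos; // assumed decay time constant
  private boolean initialized;
  private double value;
  private long lastUpdatedAtNanos;

  TimeDecayedEwmaSketch(long tauNanos) {
    this.tauNanos = tauNanos;
  }

  /** Folds one latency sample into the average, weighted by time since the last update. */
  synchronized void record(double sampleMicros, long nowNanos) {
    if (!initialized) {
      value = sampleMicros;
      initialized = true;
      lastUpdatedAtNanos = nowNanos;
      return;
    }
    // Clamp to zero so an out-of-order timestamp never produces a negative alpha.
    long elapsed = Math.max(0L, nowNanos - lastUpdatedAtNanos);
    // alpha is 0 for simultaneous updates and approaches 1 as the gap grows.
    double alpha = 1.0 - Math.exp(-elapsed / tauNanos);
    value = alpha * sampleMicros + (1.0 - alpha) * value;
    lastUpdatedAtNanos += elapsed;
  }

  /** Records an error as a large latency sample so the endpoint is deprioritized. */
  synchronized void recordError(long nowNanos) {
    record(ERROR_PENALTY_MICROS, nowNanos);
  }

  synchronized double get() {
    return value;
  }
}
```

The state read and write happen together under the monitor here, which mirrors the trade-off discussed later in the thread about computing alpha while holding the lock.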

@rahul2393
Contributor Author

/gemini review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request implements a score-aware load balancing and rerouting system for Spanner routed endpoints, introducing components like EndpointLatencyRegistry for tracking latencies and inflight requests, and EndpointOverloadCooldownTracker for managing overloaded replicas. The KeyRangeCache is updated to utilize a "Power of Two" selection strategy based on these metrics. Feedback identifies several improvement opportunities: refining cost calculations to use inflight counts even when latency data is absent, ensuring consistency by adding RESOURCE_EXHAUSTED to retryable codes for streaming SQL requests, and adjusting the EWMA decay logic to prevent score resets during near-simultaneous updates.
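The "Power of Two" strategy mentioned above samples two distinct candidates at random and keeps the cheaper one. The sketch below shows that general technique only. The names `ReplicaSketch` and `PowerOfTwoSketch`, the cost model (latency multiplied by inflight count plus one), and the default latency used when no sample exists are assumptions, not the PR's scoring logic. The fallback reflects the review feedback that inflight counts should still matter when latency data is absent.

```java
import java.util.List;
import java.util.Random;

/** Hypothetical candidate replica; field names and cost model are assumptions. */
final class ReplicaSketch {
  static final double DEFAULT_LATENCY_MICROS = 1_000.0; // assumed stand-in when unmeasured

  final String address;
  final double latencyEwmaMicros; // NaN when no latency has been observed yet
  final int inflight;

  ReplicaSketch(String address, double latencyEwmaMicros, int inflight) {
    this.address = address;
    this.latencyEwmaMicros = latencyEwmaMicros;
    this.inflight = inflight;
  }

  double cost() {
    // Without a latency sample, fall back to a default so inflight load still counts.
    double latency =
        Double.isNaN(latencyEwmaMicros) ? DEFAULT_LATENCY_MICROS : latencyEwmaMicros;
    return latency * (inflight + 1);
  }
}

final class PowerOfTwoSketch {
  /** Returns the cheaper of two distinct random candidates, or null for an empty list. */
  static ReplicaSketch pick(List<ReplicaSketch> candidates, Random random) {
    if (candidates.isEmpty()) {
      return null;
    }
    if (candidates.size() == 1) {
      return candidates.get(0);
    }
    int first = random.nextInt(candidates.size());
    // Draw the second index from the remaining n-1 slots so it always differs from the first.
    int second = random.nextInt(candidates.size() - 1);
    if (second >= first) {
      second++;
    }
    ReplicaSketch a = candidates.get(first);
    ReplicaSketch b = candidates.get(second);
    return a.cost() <= b.cost() ? a : b;
  }
}
```

Comparing only two candidates keeps selection cheap per call while still steering most traffic away from the most loaded replicas.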

@rahul2393 rahul2393 force-pushed the endpoint-cooldown-re branch from a5a6665 to 4912eac Compare April 20, 2026 06:21
initialized = true;
lastUpdatedAtNanos = nowNanos;
} else {
double alpha = fixedAlpha != null ? fixedAlpha : calculateTimeBasedAlpha(nowNanos);
Contributor

Is it strictly necessary to do this calculation while holding the lock? Or could we move the calculation to outside the lock, and only take the lock when we are reading/writing the lastUpdatedAtNanos?

Contributor Author

I will keep this as is for now. Moving the calculation out of the lock would either introduce unsynchronized access or compute alpha from stale state under contention. I will do it in a follow-up if needed.

return null;
}

private TabletSnapshot selectScoreAwareTablet(
Contributor

This method is hard to read, partly due to its length and partly due to the number of branches. I also think there are multiple opportunities to optimize it. Currently, it may loop over eligibleTablets more than once, and it creates two ArrayLists and a HashMap each time it is invoked. Given how often this method will be invoked, I think it is worth taking a second look at it.

Contributor Author

I have refactored it for readability and will improve it further in a follow-up.

@rahul2393 rahul2393 requested a review from olavloite April 20, 2026 13:21
@rahul2393 rahul2393 enabled auto-merge (squash) April 20, 2026 15:44
@rahul2393 rahul2393 disabled auto-merge April 20, 2026 16:55
@rahul2393 rahul2393 merged commit f5f273b into googleapis:main Apr 20, 2026
128 of 132 checks passed
2 participants